{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wlQ50Lgq0Dkf"
   },
   "source": [
    "Load the data and divide into training and test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "9I30gKInplA-"
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv(\"https://data.heatonresearch.com/data/t81-558/datasets/series-31-num.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "id": "RygSA-fwp8tz"
   },
   "outputs": [],
   "source": [
    "df_train = df[df['time']<3000]\n",
    "df_test = df[df['time']>=3000]\n",
    "\n",
    "seq_train = df_train['value'].tolist()\n",
    "seq_test = df_test['value'].tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hmLJyDXC0PWP"
   },
   "source": [
    "See size of each:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "OAspGDiO0Vf2",
    "outputId": "67cf3ed3-fe92-4060-e0bf-9df27cef03ef"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Size of test: 1000\n"
     ]
    }
   ],
   "source": [
    "print(f\"Size of test: {len(seq_test)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "-k5dV206qihB"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "def to_sequences(seq_size, obs):\n",
    "    x = []\n",
    "    y = []\n",
    "    for i in range(len(obs)-SEQUENCE_SIZE):\n",
    "        window = obs[i:(i+SEQUENCE_SIZE)]\n",
    "        after_window = obs[i+SEQUENCE_SIZE]\n",
    "        window = [[x] for x in window]\n",
    "        x.append(window)\n",
    "        y.append(after_window)\n",
    "        \n",
    "    return np.array(x),np.array(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "JQ5CIJJl02p6"
   },
   "source": [
    "Now divide into sequences."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "T5op9b4SqOmf"
   },
   "outputs": [],
   "source": [
    "SEQUENCE_SIZE = 5\n",
    "x_train,y_train = to_sequences(SEQUENCE_SIZE,seq_train)\n",
    "x_test,y_test = to_sequences(SEQUENCE_SIZE,seq_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-KqZg2NouwBy",
    "outputId": "64a0fed5-bc4b-4dea-f485-9cf802f96145"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(995, 5, 1)"
      ]
     },
     "execution_count": 18,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x_test.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ZSOeIsgF1tYp"
   },
   "source": [
    "The test set that I produced, using the provided to_sequences function returns 995 sequences.  This is because to_sequences expects 5 inputs (for x) plus one expected valye (for y).  This is actually wasting one value, because we do not need the y-value for evaluation.\n",
    "\n",
    "If you use your own method to generate the test data, you could get 996 rows, because your final x row will include the final value in the dataset (15). \n",
    "\n",
    "For this one I will accept either 995 or 996."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "assignment_10_rows.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
